Abstract

Social networks are rarely observed in full detail. In many situations properties are known for only a sample of the individuals in the network, and it is desirable to infer global properties of the full social network from these "egocentric" network data. In the current paper we study a few different types of egocentric data, and show which global network properties are consistent with those egocentric data. Two global network properties are considered: first, the size of the largest connected component in the network (the giant), and secondly, the possible size of an epidemic outbreak taking place on the network, in which transmission occurs only between network neighbours, and with probability $p$. The main conclusion is that in most cases, egocentric data allow for a large range of possible sizes of the giant and the outbreak. However, there is an upper bound for the latter. For the case where the network is selected uniformly among networks with prescribed egocentric data (satisfying some conditions), the asymptotic sizes of the giant and the outbreak are characterised.

Keywords: Network, giant component, epidemics, egocentric data 

1 Introduction 

Social network data may be of different levels of detail, e.g. complete (sociocentric), snowball sampled, egocentric with alter connections, or ego-only egocentric [7]. Here egocentric data means that information about the immediate surroundings of a sample of actors is collected. More precisely, following Hanneman and Riddle, we distinguish between ego-only egocentric data, where the connections of each sampled actor (ego) are all that is collected, and egocentric data with alter connections, where it is also observed which of these connections are themselves connected. Observing the complete network in a large community is of course expensive and time consuming, which is why data often consist of snowball samples or egocentric data (e.g. [7]). Clearly, the higher the level of detail in the collected data, the more can be inferred, and with higher precision [12]. However, as has been shown by Marsden [10], who studies betweenness, it is in some situations possible to infer global network properties from egocentric data in a fairly robust way. Many social networks share the property known as transitivity (closely related to clustering): if A is connected to B and B is connected to C, then it is more likely that A is also connected to C (e.g. [9]). The extent to which this property is manifested in a given social network is obviously better captured by egocentric data with alter connections than by ego-only egocentric data. As a consequence, global properties affected by transitivity/clustering should be easier to infer from the former type of data.

In the current paper we investigate what can be deduced about global network properties when observing different sorts of egocentric data. More precisely, we focus on the (relative) size $\tau$ of the largest connected component of the network (the giant) in three situations: when all that is known is the mean degree of the actors, when ego-only egocentric data are collected, and when egocentric data with alter connections are observed. In addition, we study possible scenarios for an epidemic spreading "on" the social network. The main conclusion is that very little can be said about $\tau$ if only egocentric data of any type, without additional information, are observed. The same is however not true for epidemics occurring on the (more specified) social network: the more detailed the information about the egocentric network, the narrower the range of possible outbreak sizes. We also study the size of the largest connected component and the epidemic outbreak size of a random, or typical, network with the prescribed egocentric properties.

2 Network properties and epidemic model 

Consider a community, or social network, consisting of $n$ individuals/actors, where $n$ is assumed to be large. Each pair of actors $i$ and $j$ is either connected by an (undirected) edge or not, where the edge reflects some type of social relationship (liking, shared membership of a group or household, sexual relationship, ...). Let $d_{ij} = d_{ji} = 1$ if $i$ and $j$ are connected and $d_{ij} = d_{ji} = 0$ otherwise. Knowing $d_{ij}$ for all $i$ and $j$ then corresponds to knowing the complete network. Knowing ego-only egocentric data means that we only know $d_i = \sum_j d_{ij}$, the number of connections, or the degree, of all or a sample of the actors. This means that we know the degree distribution in the community. Below we will also study the situation where we have even less information, i.e. where all that is observed is the mean degree $\mu_D = \sum_i d_i / n$. Thus, in the first situation we know the fraction $p_0$ of actors that have degree 0, the fraction $p_1$ that have degree 1, and so on (i.e. we know the degree distribution $\{p_k\}$), whereas in the latter case we only know that the mean degree equals a certain number $\mu_D$. Knowing the degree distribution also gives the mean degree through the relation $\mu_D = \sum_k k p_k$.
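The step from ego-only data to $\{p_k\}$ and $\mu_D$ can be illustrated with a minimal Python sketch. It assumes the data are simply a list of reported ego degrees, and both the list and the function names below are hypothetical. As in the text, sampling uncertainty is ignored.

```python
from collections import Counter


def degree_distribution(degrees):
    """Empirical degree distribution {p_k} from a list of observed ego degrees."""
    n = len(degrees)
    counts = Counter(degrees)
    return {k: c / n for k, c in sorted(counts.items())}


def mean_degree(p):
    """Mean degree mu_D = sum_k k * p_k."""
    return sum(k * pk for k, pk in p.items())


# Hypothetical ego-only survey: degrees reported by ten egos.
observed = [2, 3, 2, 2, 3, 2, 3, 2, 3, 2]
p = degree_distribution(observed)
print(p)                          # {2: 0.6, 3: 0.4}
print(round(mean_degree(p), 10))  # 2.4
```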

If we have egocentric data with alter connections, we also know which connections of an actor are themselves connected. That is, if $d_{ij} = 1$ and $d_{ik} = 1$, we observe whether $d_{jk} = 1$ or not. Later we will simplify this type of data by specifying two degrees for each actor: the single degree and the triangle degree. The single degree is the number of actors "ego" is connected to that are not connected to any other acquaintance of ego, and the triangle degree is the number of triangles ego is part of. For example, in Figure 1 ego (actor 1) is connected to 5 actors: 3 who, together with ego, all know each other, and 2 separate actors who each do not know any other of ego's neighbours. So ego has single degree 2 and triangle degree 3 (there are three ways to choose 2 out of the 3 common friends). Admittedly, by reducing the egocentric data with alter connections to only the actors' single and triangle degrees, we lose some information. We do this because it makes the mathematical analysis more tractable, and we hope that it has only a minor effect on the results. To conclude, we represent the egocentric data with alter connections by the single and triangle degree distribution $\{p_{k_1,k_\Delta}\}$, where $p_{k_1,k_\Delta}$ denotes the fraction of actors that have single degree $k_1$ and triangle degree $k_\Delta$.

Figure 1: A mini graph in which actor 1 has single degree 2 and triangle degree 3
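The two degrees can be read off an ego's local neighbourhood, as in the minimal Python sketch below. It assumes the network is stored as adjacency sets, and the helper names are hypothetical. The example edge list follows the text's description of Figure 1 and gives the stated degrees for actor 1.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for i, j in edges:
        adj.setdefault(i, set()).add(j)
        adj.setdefault(j, set()).add(i)
    return adj


def single_and_triangle_degree(adj, ego):
    """Return (single degree, triangle degree) of `ego`."""
    nbrs = adj[ego]
    # Triangle degree: pairs of ego's neighbours that are themselves linked.
    k_tri = sum(1 for u, v in combinations(sorted(nbrs), 2) if v in adj[u])
    # Single degree: neighbours linked to no other neighbour of ego.
    k_single = sum(1 for u in nbrs if not (adj[u] & (nbrs - {u})))
    return k_single, k_tri


# Figure 1 as described in the text: actor 1 knows 2..6; 2, 3, 4 know each other.
figure1_edges = [(1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 3), (2, 4), (3, 4)]
adj = build_adjacency(figure1_edges)
print(single_and_triangle_degree(adj, 1))  # (2, 3)
```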

The first global property we investigate is the relative size $\tau$ of the largest connected component. Actors in the network are said to be directly connected if there is an edge between them. More generally, two actors are said to be connected if there is a path of directly connected actors between them. The network can hence be decomposed into separate connected components, and the largest (in terms of number of actors) of these components is called the giant component. The relative size $\tau$ is the size (the number of actors) of the giant divided by the population size.
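For an observed (or constructed) network, $\tau$ can be computed by a breadth-first search over the connected components. The minimal Python sketch below assumes the actors are labelled 0 to $n-1$, and its function name is hypothetical.

```python
from collections import deque


def relative_giant_size(n, edges):
    """Size of the largest connected component divided by n (actors 0..n-1)."""
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    seen = [False] * n
    largest = 0
    for start in range(n):
        if seen[start]:
            continue
        # Breadth-first search over the component containing `start`.
        seen[start] = True
        queue, size = deque([start]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    queue.append(v)
        largest = max(largest, size)
    return largest / n


# A triangle {0, 1, 2}, an edge {3, 4} and an isolated actor 5.
print(relative_giant_size(6, [(0, 1), (1, 2), (0, 2), (3, 4)]))  # 0.5
```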

As mentioned earlier, we will also study the effect of an infectious disease spreading in the community, i.e. "on" the social network. An infected actor may infect any of its (not yet infected) directly connected actors, but no one else. We assume that the epidemic is initiated by one index case selected uniformly at random, and that anyone who gets infected infects each of its susceptible neighbours independently with probability $p$ (those who get infected may spread the disease to their not yet infected neighbours, and so on). This model is known as the Reed-Frost epidemic model on a network. It is well known that, for such epidemics taking place in a large community, two qualitatively different things may happen. Either only a few actors get infected (a minor outbreak), or a positive (essentially non-random) fraction gets infected, in which case we say that a major outbreak has occurred (e.g. [1], [2]). Another known fact for this class of models is that the probability $\pi$ of observing a major outbreak equals the relative size $\tau^{(epi)}$ of the major outbreak, i.e. $\pi = \tau^{(epi)}$. In the current paper we are mainly interested in the relative size of the epidemic, but for the reason just mentioned we may equally well compute the probability $\pi$ of having a major outbreak. The case $p = 1$ implies that the disease spreads to the whole connected component of the index case. The probability $\pi$ of a major outbreak is then equal to the probability that the index case belongs to the giant component, and because the index case is chosen uniformly at random this is the same as the relative size $\tau$ of the giant. We hence have $\tau^{(epi)} = \tau$ when $p = 1$.
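The transmission mechanism can be simulated directly. The minimal Python sketch below assumes the network is given as an edge list on actors 0 to $n-1$, and its function name is hypothetical. Each newly infected actor makes one independent attempt, succeeding with probability $p$, to infect each neighbour that is still susceptible. With $p = 1$ the outbreak is the whole component of the index case, as noted above.

```python
import random


def reed_frost_on_network(n, edges, p, index_case, rng):
    """Final number of infected actors in one Reed-Frost epidemic on a network."""
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    infected = {index_case}
    generation = [index_case]
    while generation:
        next_generation = []
        for u in generation:
            for v in adj[u]:
                # Each infective tries each still-susceptible neighbour once.
                if v not in infected and rng.random() < p:
                    infected.add(v)
                    next_generation.append(v)
        generation = next_generation
    return len(infected)


rng = random.Random(1)
line = [(i, i + 1) for i in range(9)]  # 10 actors on a line
print(reed_frost_on_network(10, line, 1.0, 0, rng))  # 10
print(reed_frost_on_network(10, line, 0.0, 0, rng))  # 1
```

To estimate $\pi$, repeat the simulation with uniformly random index cases and record how often the outbreak is large.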

In the next sections we investigate the range of possible relative sizes of the giant connected component, assuming that certain local features of the social network are given. We also study what effect an epidemic taking place on the social network may have. We start by assuming only that the mean degree $\mu_D$ is known, and then gradually assume more informative egocentric data.

3 Observing only mean degree 

Suppose first that all we know about the network is that the mean degree equals $\mu_D$ ($0 < \mu_D < \infty$). We first study properties of the largest connected component, and then the size of an epidemic outbreak occurring on the social network.

3.1 The giant connected component 

What might the relative size $\tau$ of the giant connected component be? Since very little is fixed (only the mean degree $\mu_D$), we may choose rather freely in order to maximise or minimise $\tau$. To minimise $\tau$, we simply form small, fully connected and isolated components of sizes $\lfloor \mu_D \rfloor + 1$ and $\lceil \mu_D \rceil + 1$. Actors then have degree $\lfloor \mu_D \rfloor$ or $\lceil \mu_D \rceil$, in proportions chosen to give mean degree $\mu_D$. Here $\lfloor \mu_D \rfloor$ is the integer part of $\mu_D$ and $\lceil \mu_D \rceil$ is the smallest integer larger than or equal to $\mu_D$. If $\mu_D$ is an integer, say $\mu_D = 5$, we simply group actors into fully connected groups of size 6 (actors then have degree 5). As a consequence, all connected components have size 6 and the relative size of the largest connected component is $6/n \approx 0$, implying that
$$\tau_{\min} = 0.$$
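The minimising construction can be written out as a minimal Python sketch. The helper name is hypothetical, and the sketch assumes $n$ is large enough that rounding the number of cliques is negligible. It places a fraction $\mu_D - \lfloor \mu_D \rfloor$ of the actors in cliques of size $\lceil \mu_D \rceil + 1$ and the rest in cliques of size $\lfloor \mu_D \rfloor + 1$. This keeps the mean degree at $\mu_D$ while every component stays bounded in size.

```python
import math
from itertools import combinations


def clique_network(n, mu):
    """Disjoint cliques with mean degree close to mu; returns (edges, largest clique size)."""
    lo, hi = math.floor(mu), math.ceil(mu)
    # A fraction mu - lo of the actors sits in cliques of size hi + 1 (degree hi).
    n_hi_cliques = round((mu - lo) * n / (hi + 1))
    sizes = [hi + 1] * n_hi_cliques
    remaining = n - n_hi_cliques * (hi + 1)
    sizes += [lo + 1] * (remaining // (lo + 1))  # cliques of size lo + 1 (degree lo)
    # Any leftover actors (fewer than lo + 1 of them) stay isolated.
    edges, start = [], 0
    for size in sizes:
        edges.extend(combinations(range(start, start + size), 2))
        start += size
    return edges, max(sizes)


edges, biggest = clique_network(1200, 2.5)
print(2 * len(edges) / 1200, biggest)  # 2.5 4
```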

If we instead want to maximise $\tau$, one way is to connect all actors with degree 2 or more into one giant component (first make a "line" out of all such actors and then add edges arbitrarily). We hence want to maximise the fraction having degree 2 or more. If $\mu_D \ge 2$, all actors can have degree larger than or equal to 2, so all actors may be connected into one giant component, i.e. $\tau_{\max} = 1$. If $\mu_D < 2$, we have to "sacrifice" a fraction of the actors so that the remaining actors all have degree 2. (We can connect two degree-1 actors to the end points of the line, but since we assume that the population is large, those two actors have only a marginal effect on the size of the giant.) More specifically, we let a fraction $1 - \mu_D/2$ have degree 0 and the remaining fraction $\mu_D/2$ have degree 2, and put the latter in one long line. The size of the largest connected component then equals $\tau_{\max} = \mu_D/2$. To conclude, if $\mu_D \ge 2$ then $\tau_{\max} = 1$, and otherwise $\tau_{\max} = \mu_D/2$.
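The two cases can be captured in a minimal Python sketch. The function names are hypothetical, and the sketch assumes $n$ is large so that the end points of the line can be ignored. The second function builds the line used when $\mu_D < 2$.

```python
def tau_max_mean_only(mu):
    """Largest possible relative giant size when only mu_D is known (n large)."""
    if mu < 0:
        raise ValueError("mean degree must be non-negative")
    # mu >= 2: everyone can have degree >= 2 and sit in one component.
    # mu < 2: a fraction mu/2 gets degree 2 (one long line), the rest degree 0.
    return 1.0 if mu >= 2 else mu / 2


def line_construction(n, mu):
    """Edge list realising tau_max for mu < 2: one line of about mu*n/2 actors."""
    m = round(mu * n / 2)
    return [(i, i + 1) for i in range(m - 1)], m


for mu in (0.5, 1.2, 2.0, 3.5):
    print(mu, tau_max_mean_only(mu))
```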

This feature, that $\tau_{\min} = 0$ and $\tau_{\max} = 1$ or close to 1 (if the mean degree is large enough), will recur when we observe ego-only egocentric data or egocentric data with alter connections.

We now pick a network at random among all networks having mean degree $\mu_D$. Fixing the mean degree at $\mu_D$ is identical to fixing the total number of edges at $n\mu_D/2$ (the denominator 2 comes from the fact that each edge contributes to the degree of two different nodes). Choosing a network with $n$ nodes and $m = n\mu_D/2$ edges uniformly at random is a well-known model of Erdős and Rényi, denoted $G(n,m)$. It is well known that this network is (for our purposes) asymptotically equivalent to the more familiar $G(n, \mu_D/n)$ network of Erdős and Rényi, in which edges appear between different pairs of nodes independently, each with probability $\mu_D/n$ [1], [5].

The relative size $\tau_{Rand}$ of the largest connected component in this network is given by the largest solution $\tau_{Rand} = t$ to the equation
$$1 - t = e^{-\mu_D t}. \qquad (1)$$
See Figure 2 for an illustration of how $\tau_{Rand}$ depends on $\mu_D$. It is also known that $\tau_{Rand}$ is strictly positive if and only if $\mu_D > 1$ [6], [1], [5].
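Equation (1) has no closed-form solution, but it is easy to solve numerically. The minimal Python sketch below (hypothetical function name) uses fixed-point iteration from $t = 1$. The map $t \mapsto 1 - e^{-\mu_D t}$ is increasing and sends 1 below 1, so the iterates decrease monotonically to the largest solution.

```python
import math


def tau_rand_er(mu, tol=1e-12, max_iter=1_000_000):
    """Largest solution t of 1 - t = exp(-mu * t), Equation (1)."""
    t = 1.0
    for _ in range(max_iter):
        # Starting at t = 1, the iterates decrease towards the largest root.
        t_new = 1.0 - math.exp(-mu * t)
        if abs(t_new - t) < tol:
            return t_new
        t = t_new
    return t


for mu in (0.5, 1.5, 2.0, 4.0):
    print(mu, round(tau_rand_er(mu), 4))
```

Close to the threshold $\mu_D = 1$ the convergence becomes slow, which is why the iteration cap is generous.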

Figure 2: The relative size $\tau_{Rand}$ of the largest connected component in an Erdős-Rényi network.



3.2 Epidemic outbreak size 

Suppose now that we are interested in the potential spread of an infectious disease taking place on the social network having mean degree $\mu_D$. More precisely, we assume that the transmission model is as defined above, with transmission probability $p$ ($0 < p < 1$), and that the mean degree $\mu_D$ is all that is given about the social network.

How small can a major outbreak be? Having transmission probability $p$ means that we erase each existing edge with probability $1 - p$ and keep it with probability $p$. This may break up the original largest connected component, but never makes it bigger. As a consequence, we still have
$$\tau^{(epi)}_{\min} = 0,$$
since it was shown in the previous section that $\tau_{\min} = 0$.

Because the epidemic has the effect of removing some edges (often referred to as thinning), it is very unlikely that everyone gets infected. How do we maximise the size of a major outbreak for given $\mu_D$ and transmission probability $p$? Let us first consider the case where $\mu_D$ is a multiple of 2. One choice of network that maximises the outbreak size/probability $\tau^{(epi)}$ is then to let $\mu_D/2$ (an integer number of) actors each be connected to every other actor (we call them central nodes), and to connect each of the remaining $n - \mu_D/2$ actors only to these central actors (see Figure 3 for an illustration of this "starlike" construction for the case $\mu_D = 4$). The mean degree of this network equals
$$\frac{\mu_D/2}{n}(n-1) + \frac{n - \mu_D/2}{n}\cdot\frac{\mu_D}{2} \approx \mu_D,$$
the approximation relying on $n$ being large.




Figure 3: Illustration of a large network having mean degree $\mu_D = 4$ that maximises the probability and size of a major outbreak. The relative outbreak size in case of a major outbreak equals $\tau^{(epi)} = 1 - (1-p)^2$, where $p$ is the transmission probability.

Computing the probability $\pi$ of a large epidemic outbreak for this network is straightforward, and as before we have $\pi = \tau^{(epi)}$, the relative size of a major outbreak. The index case is selected uniformly at random, so most likely it is one of the nodes having degree $\mu_D/2$. If this actor infects at least one of its neighbours, then a major outbreak will certainly occur, since all of its neighbours are central actors, each with degree $n - 1$. The probability that the actor infects at least one neighbour is $1 - (1-p)^{\mu_D/2}$, which hence equals $\pi = \tau^{(epi)}$. This reasoning is easily extended to the case where $\mu_D/2$ is not an integer. To this end, let $\mu_D/2 = \lfloor \mu_D/2 \rfloor + a$, where $\lfloor \mu_D/2 \rfloor$ is the integer part of $\mu_D/2$ and $a$ the remainder. Then there should be $\lfloor \mu_D/2 \rfloor$ central nodes, each connected to all other nodes, and one node connected to $an$ other nodes. (If $\lfloor \mu_D/2 \rfloor = 0$, this means that a fraction $1 - a$ of the nodes are isolated and the remaining fraction $a$ form a star.) To compute the probability (= relative size) of an outbreak, we then have to condition on whether the selected index case is connected to $\lfloor \mu_D/2 \rfloor$ or $\lfloor \mu_D/2 \rfloor + 1$ nodes. The resulting expression for the probability/size of an outbreak, also valid when $\mu_D/2$ is an integer (or equivalently $a = 0$), is
$$\tau^{(epi)}_{\max} = (1-a)\left(1 - (1-p)^k\right) + a\left(1 - (1-p)^{k+1}\right), \qquad (2)$$
where $k = \lfloor \mu_D/2 \rfloor$ and $a = \mu_D/2 - k$.
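Equation (2) is straightforward to evaluate. The minimal Python sketch below (hypothetical function name) computes the outbreak size of the starlike construction and checks it against the $\mu_D = 4$ case of Figure 3, where it reduces to $1 - (1-p)^2$.

```python
import math


def tau_epi_starlike(mu, p):
    """Outbreak size/probability of the starlike construction, Equation (2)."""
    k = math.floor(mu / 2)   # number of central nodes
    a = mu / 2 - k           # fraction of actors also linked to the extra hub
    return (1 - a) * (1 - (1 - p) ** k) + a * (1 - (1 - p) ** (k + 1))


print(tau_epi_starlike(4.0, 0.5))  # 0.75
print(tau_epi_starlike(1.0, 0.3))  # k = 0: a star covering half of the actors
```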

Finally, we treat the size of an epidemic outbreak in a network selected uniformly at random among all networks having mean degree $\mu_D$. As mentioned earlier, such a network corresponds to the Erdős-Rényi network, and the epidemic on the network corresponds to the Reed-Frost epidemic model (e.g. [2], [5]) with transmission probability $p\mu_D/n$ between each pair of actors. If an epidemic occurs on this network, the final size $\tau^{(epi)}_{Rand}$ is given by the largest solution $\tau^{(epi)}_{Rand} = t$ to the equation
$$1 - t = e^{-p\mu_D t}. \qquad (3)$$
Note that when $p = 1$ this equation coincides with Equation (1), as is to be expected. It is also known that $\tau^{(epi)}_{Rand}$ is strictly positive if and only if $p\mu_D > 1$ (e.g. [14], [2]). From (3) it is seen that $\tau^{(epi)}_{Rand}$ depends only on the product $p\mu_D$ and not on the separate factors. Figure 2 also illustrates this dependence, with $\mu_D$ playing the role of $p\mu_D$.
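Because only the product $p\mu_D$ enters Equation (3), the same fixed-point iteration as for Equation (1) applies with $\mu_D$ replaced by $p\mu_D$. The minimal Python sketch below (hypothetical function name) shows that two parameter pairs with equal product give the same outbreak size.

```python
import math


def tau_epi_rand_er(mu, p, tol=1e-12, max_iter=1_000_000):
    """Largest solution t of 1 - t = exp(-p * mu * t), Equation (3)."""
    R = p * mu  # only the product p * mu_D enters
    t = 1.0
    for _ in range(max_iter):
        t_new = 1.0 - math.exp(-R * t)
        if abs(t_new - t) < tol:
            return t_new
        t = t_new
    return t


# Same product p * mu_D = 2, hence the same outbreak size.
print(tau_epi_rand_er(4.0, 0.5), tau_epi_rand_er(2.0, 1.0))
```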



4 Observing egocentric data: ego only 

We now consider the case where we observe ego-only egocentric data, i.e. we observe the degree of a sample of, or all of, the actors in the network. In the case of a sample, we neglect the uncertainty stemming from not knowing the exact degree distribution. We hence assume that we know the degree distribution $\{p_k\}$, where $p_k$ is the probability that a randomly selected actor has degree $k$.



4.1 The giant connected component 

Just as in the previous section, it is easy to construct a network consisting of small, completely connected, isolated units, thus achieving $\tau_{\min} = 0$. Similarly, it is possible to join all actors having degree 2 or larger into one single giant connected component by putting them in a line. As a consequence, the size of the giant connected component can be at least as large as the community fraction having degree 2 or larger, i.e.
$$\tau_{\max} \ge 1 - (p_0 + p_1). \qquad (4)$$
This can be made even larger by connecting the degree-1 actors to actors in the line that have degree larger than 2. Measured per actor in the community, the mean number of additional neighbours that these "large-degree" actors can still accept is $\sum_{k \ge 3} (k-2) p_k$. If this number exceeds $p_1$, then $\tau_{\max} = 1 - p_0$. Otherwise $\tau_{\max} = 1 - (p_0 + p_1) + \sum_{k \ge 3} (k-2) p_k$.
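Both cases of this bound can be computed directly from $\{p_k\}$. The minimal Python sketch below (hypothetical function name) assumes $n$ is large, so the two end points of the line can be ignored, and takes the degree distribution as a dictionary mapping $k$ to $p_k$.

```python
def tau_max_degree_dist(p):
    """Largest relative giant size from the line construction, given {p_k} (dict k -> p_k)."""
    p0, p1 = p.get(0, 0.0), p.get(1, 0.0)
    # Actors of degree >= 2 form a line; a degree-k actor in the line has
    # k - 2 stubs left over, each of which can take one degree-1 actor.
    free_stubs = sum((k - 2) * pk for k, pk in p.items() if k >= 3)
    return 1 - (p0 + p1) + min(p1, free_stubs)


print(tau_max_degree_dist({2: 0.6, 3: 0.4}))          # every actor can join the giant
print(tau_max_degree_dist({0: 0.1, 1: 0.5, 3: 0.4}))  # too few free stubs for all degree-1 actors
```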

Having determined $\tau_{\min}$ and $\tau_{\max}$, we now look at the case where we choose our network uniformly at random among all networks having degree distribution $\{p_k\}$. This is exactly what is done in the configuration model (e.g. [5], [13]). There, actors are given i.i.d. degrees according to the degree distribution $\{p_k\}$, and the edge stubs of the nodes are paired completely at random. This may of course lead to self-loops and multiple edges. However, it is known [5] that the fraction of such edges is negligible when $\mu_D < \infty$, so they may be removed without affecting the limiting degree distribution.

The relative size $\tau_{Rand}$ of the giant connected component in a network constructed using the configuration model has already been derived (e.g. [14], [15]). Let
$$\rho(s) = \sum_k s^k p_k$$
denote the probability generating function of the degree distribution and $\rho'(s)$ its derivative. Let $\xi$ denote the largest solution to the equation
$$1 - \xi = \frac{\rho'(1 - \xi)}{\rho'(1)}.$$
Given the solution $\xi$ (which will lie in $[0, 1]$), our quantity of interest, $\tau_{Rand}$, is given by
$$\tau_{Rand} = 1 - \rho(1 - \xi). \qquad (5)$$

It is also known that
$$\tau_{Rand} > 0 \quad \text{if and only if} \quad R_G := \mu_D + \frac{\sigma_D^2 - \mu_D}{\mu_D} > 1,$$
where $\sigma_D^2$ is the variance of the degree distribution. In Figure 4 we plot $\tau_{Rand}$ as a function of $\mu_D$ with fixed standard deviation $\sigma_D$ or fixed $R_G$, and as a function of $\sigma_D$ or $R_G$ with fixed mean degree $\mu_D$, where $D$ has a negative binomial distribution.
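Equation (5) and the $R_G$ criterion can be evaluated numerically for any degree distribution with finite support. The minimal Python sketch below (hypothetical function names) uses plain fixed-point iteration on $s = 1 - \xi$. As a check, a Poisson distribution should reproduce Equation (1), since a uniformly random network with given mean degree has Poisson degrees.

```python
import math


def tau_rand_config(p_k, tol=1e-12, max_iter=1_000_000):
    """Relative giant size in the configuration model, Equation (5).

    p_k maps each degree k to p_k. With s = 1 - xi the equation for xi reads
    s = rho'(s) / rho'(1); iterating from s = 0 increases monotonically to the
    smallest root s, i.e. the largest root xi.
    """
    mu = sum(k * q for k, q in p_k.items())

    def rho(s):  # probability generating function of the degree
        return sum(q * s**k for k, q in p_k.items())

    def g(s):  # rho'(s) / rho'(1)
        return sum(k * q * s ** (k - 1) for k, q in p_k.items() if k > 0) / mu

    s = 0.0
    for _ in range(max_iter):
        s_new = g(s)
        converged = abs(s_new - s) < tol
        s = s_new
        if converged:
            break
    return 1.0 - rho(s)


def r_g(p_k):
    """R_G = mu_D + (sigma_D^2 - mu_D) / mu_D."""
    mu = sum(k * q for k, q in p_k.items())
    var = sum(k * k * q for k, q in p_k.items()) - mu**2
    return mu + (var - mu) / mu


# Poisson(2) degrees, truncated at k = 60, should reproduce Equation (1) with mu_D = 2.
poisson = {k: math.exp(-2.0) * 2.0**k / math.factorial(k) for k in range(61)}
print(round(tau_rand_config(poisson), 4), round(r_g(poisson), 4))
```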



Figure 4: Illustration of how $\tau_{Rand}$ depends on the mean $\mu_D$ of the degree distribution, with $\sigma_D^2 = 2$ fixed (a) and with $R_G = 2$ fixed (b), and how $\tau_{Rand}$ depends on $R_G$ (c) or on the variance $\sigma_D^2$ (d) with $\mu_D = 1$ fixed. Here $D$ has a negative binomial distribution.

The case where only the mean degree is fixed (Section 3) and a randomly selected network is chosen corresponds to the case where the degree distribution is Poisson with mean $\mu_D$.

4.2 Epidemic outbreak size 

We now look at what can happen with an epidemic (with transmission probability $p$) occurring on a network having degree distribution $\{p_k\}$. Adding the epidemic, i.e. removing edges with probability $1 - p$, can only make the largest connected component smaller. So, as in the previous section, the minimal size of the largest connected component is still $\tau^{(epi)}_{\min} = 0$.
The corresponding maximisation problem is more involved. It is intractable to characterise in general how to construct a network with fixed degree distribution $\{p_k\}$ such that the epidemic outbreak size is maximal. Instead we illustrate the construction for one particular (simple) degree distribution: $p_2 = 1 - p_3 = 0.6$, i.e. 60% of all nodes have degree 2 and the remaining 40% have degree 3, implying that $\mu_D = 2.4$.

The question is hence how we should connect nodes in order to maximise the size of the largest connected component in the thinned network (corresponding to the epidemic). We should avoid short loops, because these only reduce spreading: some potential infectious contacts will then be with already infected people. The remaining question is therefore how to connect 2-nodes (and 3-nodes, respectively) to other actors. We extend the configuration model in the following way (knowing that this will result in a network without clustering). Distribute the degrees of actors as in the configuration model (i.e. i.i.d. degrees, each equal to 2 with probability 0.6 and otherwise equal to 3). We now let each stub of a 2-node connect to a stub of another 2-node with probability $r$ ($0 < r < 1$) and to a stub of a 3-node with the remaining probability $1 - r$. The 2-nodes and the 3-nodes carry the same total number of stubs ($0.6 \cdot 2 = 0.4 \cdot 3 = 1.2$ per actor). The numbers of edges therefore match only if stubs of 3-nodes also connect to stubs of other 3-nodes with probability $r$, and to stubs of 2-nodes with probability $1 - r$. The parameter $r$ can be interpreted as the fraction of all connections that are to actors with the same degree, and it may be freely chosen in order to maximise the size of the giant. It is closely related to the degree correlation: if $r$ is small we have negative degree correlation, whereas if $r$ is large the degree correlation is positive.

It is straightforward to (numerically) deduce the size of an outbreak for this epidemic model. Let $\eta_2$ be the probability that an actor of degree 2 who is infected during the epidemic (through one of its edges) will only generate a small number of further cases. Define $\eta_3$ similarly. Since an infected actor of degree 2 can infect only one other actor, which has degree 2 with probability $r$ and degree 3 with probability $1 - r$, we have
$$\eta_2 = (1-p) + p r \eta_2 + p(1-r)\eta_3.$$
Here $1 - p$ is the probability that the infected degree-2 actor does not infect the other actor. The term $p r \eta_2$ (resp. $p(1-r)\eta_3$) is the probability that a degree-2 (resp. degree-3) actor gets infected but does not cause many further infections. Similarly we deduce that
$$\eta_3 = \left[(1-p) + p(1-r)\eta_2 + p r \eta_3\right]^2.$$
From the theory of branching processes [8], we know that we need the solution for which both $\eta_2$ and $\eta_3$ are minimal.

Similar arguments show that the probability $\tau^{(epi)}(\mu_D, r, p)$ that an actor chosen uniformly at random is part of a large outbreak, if such an outbreak occurs, satisfies
$$1 - \tau^{(epi)}(\mu_D, r, p) = 0.6\left[(1-p) + p r \eta_2 + p(1-r)\eta_3\right]^2 + 0.4\left[(1-p) + p(1-r)\eta_2 + p r \eta_3\right]^3.$$
If $p < 1/2$, then a large outbreak has probability 0, even if all actors had degree 3. If, on the other hand, $p > 1/2$, then a large outbreak is possible for some $r$. The value $r_{\max}$ of $r$ for which the outbreak size is maximised is given in Figure 5.
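The fixed-point equations above can be solved by iterating from $\eta_2 = \eta_3 = 0$. Because the right-hand sides are increasing in $\eta_2$ and $\eta_3$, the iterates rise to the minimal solution. The minimal Python sketch below (hypothetical function names) does this and locates $r_{\max}$ by a simple grid search over $r$. It mirrors the construction in the text, but it is not claimed to reproduce Figure 5 exactly.

```python
def outbreak_two_degree(r, p, tol=1e-12, max_iter=1_000_000):
    """tau^(epi) for p_2 = 0.6, p_3 = 0.4 with same-degree mixing fraction r."""
    eta2 = eta3 = 0.0  # iterate upwards from 0 to reach the minimal solution
    for _ in range(max_iter):
        new2 = (1 - p) + p * r * eta2 + p * (1 - r) * eta3
        new3 = ((1 - p) + p * (1 - r) * eta2 + p * r * eta3) ** 2
        done = abs(new2 - eta2) < tol and abs(new3 - eta3) < tol
        eta2, eta3 = new2, new3
        if done:
            break
    miss2 = (1 - p) + p * r * eta2 + p * (1 - r) * eta3  # one edge of a 2-node
    miss3 = (1 - p) + p * (1 - r) * eta2 + p * r * eta3  # one edge of a 3-node
    return 1 - (0.6 * miss2**2 + 0.4 * miss3**3)


def best_r(p, grid=101):
    """Grid search for r_max, the mixing fraction maximising the outbreak size."""
    rs = [i / (grid - 1) for i in range(grid)]
    return max(rs, key=lambda r: outbreak_two_degree(r, p))


for p in (0.6, 0.8, 1.0):
    r_best = best_r(p)
    print(p, r_best, round(outbreak_two_degree(r_best, p), 4))
```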

The qualitative conclusion from the example, also valid for other degree distributions $\{p_k\}$, is therefore as follows. When $p$ (or more correctly $p\mu_D$) is small, the size $\tau^{(epi)}$ of the giant component in the epidemic is maximised when nodes with high degree are connected to other nodes with high degree (and low to low). When $p\mu_D$ is large, $\tau^{(epi)}$ is maximised by the opposite construction (low to high).

Figure 5: The fraction $r_{\max}$ of edges connecting actors of the same degree for which a large outbreak is maximised, in a network in which 60% of the actors have degree 2 and the other actors have degree 3, as a function of the transmission probability $p$ (solid line). The dashed line gives the corresponding relative outbreak size $\tau^{(epi)}_{\max}$ as a function of $p$.

Finally, the epidemic outbreak size in a randomly selected network having degree distribution $\{p_k\}$ and transmission probability $p$ is obtained exactly as for the size of the giant connected component in the randomly selected network. The only difference is that, after the network has been thinned, only a binomial number of an actor's neighbours remain connected to it. More precisely, as has been shown in e.g. [3], $\xi$ is now the largest solution to
$$1 - \xi = \frac{\rho'(1 - p\xi)}{\rho'(1)}.$$
Given the solution $\xi$, our quantity of interest, $\tau^{(epi)}_{Rand}$, is then given by
$$\tau^{(epi)}_{Rand} = 1 - \rho(1 - p\xi). \qquad (6)$$
Similar to before, it also holds that
$$\tau^{(epi)}_{Rand} > 0 \quad \text{if and only if} \quad R_0 = p\left(\mu_D + \frac{\sigma_D^2 - \mu_D}{\mu_D}\right) > 1,$$
where $R_0$ denotes the basic reproduction number.
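Equation (6) and the $R_0$ criterion can be evaluated in the same way as Equation (5). The minimal Python sketch below (hypothetical function names) iterates $\xi \mapsto 1 - \rho'(1 - p\xi)/\rho'(1)$ from $\xi = 1$. This map is increasing in $\xi$, so the iterates decrease to the largest root. As a check, thinning Poisson degrees should reproduce Equation (3).

```python
import math


def tau_epi_config(p_k, p, tol=1e-12, max_iter=1_000_000):
    """Outbreak size in the configuration model with transmission probability p, Eq. (6)."""
    mu = sum(k * q for k, q in p_k.items())

    def rho(s):
        return sum(q * s**k for k, q in p_k.items())

    def g(s):  # rho'(s) / rho'(1)
        return sum(k * q * s ** (k - 1) for k, q in p_k.items() if k > 0) / mu

    # xi -> 1 - g(1 - p*xi) is increasing in xi, so iterating from xi = 1
    # decreases monotonically to the largest root.
    xi = 1.0
    for _ in range(max_iter):
        xi_new = 1.0 - g(1.0 - p * xi)
        converged = abs(xi_new - xi) < tol
        xi = xi_new
        if converged:
            break
    return 1.0 - rho(1.0 - p * xi)


def r0_config(p_k, p):
    """R_0 = p * (mu_D + (sigma_D^2 - mu_D) / mu_D)."""
    mu = sum(k * q for k, q in p_k.items())
    var = sum(k * k * q for k, q in p_k.items()) - mu**2
    return p * (mu + (var - mu) / mu)


# Thinning Poisson(4) degrees with p = 0.5 should match Equation (3) with p * mu_D = 2.
poisson4 = {k: math.exp(-4.0) * 4.0**k / math.factorial(k) for k in range(81)}
print(round(tau_epi_config(poisson4, 0.5), 4), round(r0_config(poisson4, 0.5), 4))
```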

Suppose we know that a network is well described by a configuration model and we know the mean degree $\mu_D$ of the actors. One further question is then: for which distribution $\{p_k\}$ is the size of a large outbreak maximised when the transmission probability equals $p$? In [4] it is shown that the answer depends on $p$ and $\mu_D$. In all cases, however, the degree distribution should be non-zero on at most 2 consecutive positive integers, and possibly at degree 0.
5 Observing egocentric data with alter connections



We end our analysis with the situation where the number of connections of all (or a sample of) actors is observed, and where it is also observed which of the connections of an actor are themselves connected. Such data, referred to as egocentric data with alter connections (e.g. [7]), can often be collected in egocentric network surveys, since egos are usually aware of this information.

Such data give the degree distribution in the community. They also give, for each degree, the community frequency of having any given set of fully connected groups of neighbours of various sizes. As an example, one would know what fraction of the community has degree 6 where two connections are not connected to any other connection, and the remaining four are connected pairwise (forming 2 triangles with ego). As mentioned previously, we simplify this type of data to knowing only the degree distribution, how many of the connections are not connected with others, and how many are connected pairwise. This carries the implicit assumption that fully connected groups larger than triangles are unlikely. The distribution is hence specified by $\{p(k_1, k_\Delta)\}$, where $p(k_1, k_\Delta)$ is the probability that a randomly selected ego has $k_1$ connections that are not connected to other connections of ego, and $k_\Delta$ pairs of connections that are themselves connected pairwise. The total degree of such an actor is hence $k_1 + 2k_\Delta$.

5.1 The giant connected component 

As in the previous situations, it is possible to construct a network consisting of only small connected components. Now that the number of triangles each actor belongs to is prespecified, this is a bit more involved, as it is no longer possible to join egos into fully connected components. However, it is possible to pick suitably many egos of a given degree pair $(k_1, k_\Delta)$ such that they form an isolated component. For $(k_1 = 2, k_\Delta = 1)$, 9 egos suffice (see Figure 6). As a consequence, it is possible to construct a network without a giant component, so $\tau_{\min} = 0$.



Figure 6: A component of a graph where all elements have degree pair $(k_1 = 2, k_\Delta = 1)$
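One way to build such a 9-ego component is sketched below in Python: three triangles, with their members joined by a 9-cycle of singleton edges that alternates between the triangles. This is an illustrative construction and not necessarily the graph drawn in Figure 6. The check confirms that every actor has degree pair $(2, 1)$, and the helper name is hypothetical.

```python
from itertools import combinations


def single_and_triangle_degree(adj, ego):
    """Return (single degree, triangle degree) of `ego` from adjacency sets."""
    nbrs = adj[ego]
    k_tri = sum(1 for u, v in combinations(sorted(nbrs), 2) if v in adj[u])
    k_single = sum(1 for u in nbrs if not (adj[u] & (nbrs - {u})))
    return k_single, k_tri


# Three triangles plus a 9-cycle of singleton edges alternating between them.
triangles = [(0, 1, 2), (3, 4, 5), (6, 7, 8)]
cycle = [0, 3, 6, 1, 4, 7, 2, 5, 8]
edges = [e for t in triangles for e in combinations(t, 2)]
edges += [(cycle[i], cycle[(i + 1) % 9]) for i in range(9)]

adj = {v: set() for v in range(9)}
for i, j in edges:
    adj[i].add(j)
    adj[j].add(i)

print({single_and_triangle_degree(adj, v) for v in range(9)})  # {(2, 1)}
```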

Similarly, in most situations it is possible to construct a network in which all egos are connected (i.e. $\tau_{\max} = 1$). This may not be the case if the degrees (of both sorts) are too small; then some egos have to be "sacrificed" just like before. We will not characterise which degree distributions allow all egos to be connected (i.e. $\tau_{\max} = 1$), nor how large the giant may be if this is not the case.

We now turn to the relative size of the largest connected component of a random network having the specified distribution $\{p(k_1, k_\Delta)\}$ of singleton neighbours and pairs of interconnected neighbours. For this we use results by Miller [11], who derives $\tau_{Rand}$ for such a random network. The recipe is given in the next subsection, and the giant of the original network corresponds to the special case where the transmission probability $p$ equals 1.



5.2 Epidemic outbreak size 

Removing edges because of non-transmission can never increase the size of the giant component, so we still have $\tau^{(epi)}_{\min} = 0$, as in the case without the epidemic.

Next we present how to derive the relative size of the giant of a randomly selected network having the prescribed single and triangle degree distribution $\{p(k_1, k_\Delta)\}$, using methods from [11]. This is done by first solving for 4 unknowns $g_1, g_\Delta, h_1, h_\Delta$ from 4 equations. Here $g_1$ and $g_\Delta$ are the probabilities that a singleton edge, or a triangle respectively, of a randomly selected node does not connect it to the giant component. Similarly, $h_1$ and $h_\Delta$ are the probabilities that a node reached by a randomly selected singleton edge, or triangle respectively, does not connect to the giant through its other edges and triangles. The four equations are:
$$g_1 = 1 - p + p h_1,$$
$$h_1 = \frac{1}{E(D_1)} \sum_{k_1, k_\Delta} k_1\, p(k_1, k_\Delta)\, g_1^{k_1 - 1} g_\Delta^{k_\Delta},$$
$$g_\Delta = (1 - p + p h_\Delta)^2 - 2p^2(1-p)\, h_\Delta (1 - h_\Delta),$$
$$h_\Delta = \frac{1}{E(D_\Delta)} \sum_{k_1, k_\Delta} k_\Delta\, p(k_1, k_\Delta)\, g_1^{k_1} g_\Delta^{k_\Delta - 1}.$$
Here $D_1$ and $D_\Delta$ denote the single and triangle degree of a randomly selected actor. These equations can be solved iteratively, beginning with e.g. $h_1 = h_\Delta = 0$, giving the numerical solutions $g_1, h_1, g_\Delta, h_\Delta$. Given these solutions, the relative final size of a major epidemic outbreak in a random network with single and triangle degree distribution $\{p(k_1, k_\Delta)\}$ and transmission probability $p$ is given by
$$\tau^{(epi)}_{Rand} = 1 - \sum_{k_1, k_\Delta} p(k_1, k_\Delta)\, g_1^{k_1} g_\Delta^{k_\Delta}. \qquad (7)$$
Further, $\tau^{(epi)}_{Rand}$ is strictly positive if and only if the basic reproduction number $R_0$ exceeds 1. It is shown in [11] that $R_0$ is the dominant eigenvalue of the $2 \times 2$ matrix $M$ defined by
$$M = \begin{pmatrix} \dfrac{p\,E(D_1^2 - D_1)}{E(D_1)} & \dfrac{p\,E(D_1 D_\Delta)}{E(D_\Delta)} \\[2ex] \dfrac{2p(1+p-p^2)\,E(D_1 D_\Delta)}{E(D_1)} & \dfrac{2p(1+p-p^2)\,E(D_\Delta^2 - D_\Delta)}{E(D_\Delta)} \end{pmatrix}.$$
The corresponding result for the giant component of the original network is obtained by setting $p = 1$ in the equations above, which reduces the number of equations to be solved iteratively from 4 to 2.
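The recipe can be coded directly. The minimal Python sketch below (hypothetical function names) iterates the four equations from $h_1 = h_\Delta = 0$, evaluates Equation (7), and computes $R_0$ in closed form as the larger eigenvalue of the $2 \times 2$ matrix $M$. The $R_0$ part assumes $E(D_1) > 0$ and $E(D_\Delta) > 0$. The example uses the distribution of Figure 7, with triangles formed at random.

```python
import math


def _g(h1, ht, p):
    """g_1 and g_Delta as functions of h_1, h_Delta."""
    g1 = 1 - p + p * h1
    gt = (1 - p + p * ht) ** 2 - 2 * p**2 * (1 - p) * ht * (1 - ht)
    return g1, gt


def miller_outbreak(pk, p, tol=1e-12, max_iter=1_000_000):
    """Equation (7); pk maps (k1, k_Delta) -> p(k1, k_Delta)."""
    e1 = sum(k1 * q for (k1, kt), q in pk.items())  # E(D_1)
    et = sum(kt * q for (k1, kt), q in pk.items())  # E(D_Delta)
    h1 = ht = 0.0  # start from 0 so the iteration rises to the minimal solution
    for _ in range(max_iter):
        g1, gt = _g(h1, ht, p)
        new_h1 = sum(k1 * q * g1 ** (k1 - 1) * gt**kt
                     for (k1, kt), q in pk.items() if k1 > 0) / e1 if e1 else 1.0
        new_ht = sum(kt * q * g1**k1 * gt ** (kt - 1)
                     for (k1, kt), q in pk.items() if kt > 0) / et if et else 1.0
        done = abs(new_h1 - h1) < tol and abs(new_ht - ht) < tol
        h1, ht = new_h1, new_ht
        if done:
            break
    g1, gt = _g(h1, ht, p)
    return 1 - sum(q * g1**k1 * gt**kt for (k1, kt), q in pk.items())


def miller_r0(pk, p):
    """Dominant eigenvalue of the 2x2 matrix M."""
    e1 = sum(k1 * q for (k1, kt), q in pk.items())
    et = sum(kt * q for (k1, kt), q in pk.items())
    e11 = sum((k1 * k1 - k1) * q for (k1, kt), q in pk.items())
    e1t = sum(k1 * kt * q for (k1, kt), q in pk.items())
    ett = sum((kt * kt - kt) * q for (k1, kt), q in pk.items())
    c = 2 * p * (1 + p - p**2)
    a11, a12 = p * e11 / e1, p * e1t / et
    a21, a22 = c * e1t / e1, c * ett / et
    tr, det = a11 + a22, a11 * a22 - a12 * a21
    return (tr + math.sqrt(tr * tr - 4 * det)) / 2


# Figure 7 distribution: every actor in one triangle, half also with two singleton edges.
pk = {(0, 1): 0.5, (2, 1): 0.5}
print(round(miller_outbreak(pk, 1.0), 6), miller_r0(pk, 1.0))  # 0.875 2.0
```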
Having derived $\tau^{(epi)}_{Rand}$ defined in (7), we can as before ask: is it possible to have a bigger epidemic outbreak than $\tau^{(epi)}_{Rand}$ for a fixed degree distribution $\{p(k_1, k_\Delta)\}$ and transmission probability $p$? The answer is "yes", as it was for the case with a given singleton degree distribution and no triangles (cf. Section 4.2). In fact, a larger outbreak can be obtained by connecting large-degree egos to other large-degree egos when the transmission probability and mean degrees are small, and by connecting large-degree egos to small-degree egos when these quantities are large. Characterising exactly how this should be done for a given degree distribution $\{p(k_1, k_\Delta)\}$ is, however, not very instructive and is hence omitted.

In Figure 7 we illustrate how the relative size of the giant component varies with the transmission probability $p$ for the case where $p(k_1 = 0, k_\Delta = 1) = p(k_1 = 2, k_\Delta = 1) = 0.5$. That is, all actors belong to one triangle, half of the actors have no other connections, and the other half have two independent singleton edges on top of this. We plot both the case where the triangles are formed completely at random (zero degree correlation) and the extreme case where triangles are always formed by connecting actors having the same degree. It is seen that positive degree correlation gives the larger outbreak size when $p$ is small, whereas zero degree correlation gives the larger outbreak size when $p$ is close to 1.




Figure 7: Illustration of how $\tau = \tau^{(epi)}$ varies with $p$ for a given singleton and triangle degree distribution, both for the random case (dashed line) and for the case where nodes of similar degree tend to be connected (solid line).



6 Discussion 



In this paper we have described how large or small the giant connected component, and the size of an epidemic outbreak occurring on the social network, might be for given information about the egocentric network. For all types of egocentric data, it is possible to have no giant (or epidemic outbreak) of the same order as the network. However, the upper bound on the size of the giant or outbreak decreases the more detailed the available egocentric data are. For the epidemic, an outbreak larger than that of a randomly selected network consistent with the egocentric data is obtained as follows. If the transmission probability $p$ and the degrees are large, actors with high degree should be connected to low-degree actors; if these numbers are small, actors with high degree should be connected to other high-degree actors. That is, if the degrees and transmission probability are large, we get a larger outbreak if the degree correlation is negative. If the degrees and transmission probability are small, we get a larger outbreak if the degree correlation is positive.

In the data form denoted egocentric with alter connections, it was assumed that each neighbour of ego was either not connected to any other neighbour of ego, or connected to exactly one other neighbour of ego. This is of course a simplification of real-world networks (for example, households are usually treated as fully connected groups of actors). It is an open question what effect such a deviation from the model assumptions has on the network properties.

In this paper we studied three levels of detail in egocentric data: mean degree, degree distribution, and degree distribution including singleton and triangle degrees. The only global properties treated were the size of the giant and that of a possible epidemic outbreak in the community. Many other global properties are worth analysing under the same scenario, for example the diameter and betweenness of the network.